Abstract

We propose a weighting scheme for the proposals within Markov chain Monte Carlo algorithms and show how this can improve statistical efficiency at no extra computational cost. These methods are most powerful when combined with multi-proposal MCMC algorithms such as multiple-try Metropolis, which can efficiently exploit modern computer architectures with large numbers of cores. The locally weighted Markov chain Monte Carlo method also improves upon a partial parallelization of the Metropolis-Hastings algorithm via Rao-Blackwellization. We derive the effective sample size of the output of our algorithm and show how to estimate this in practice. Illustrations and examples of the method are given and the algorithm is compared in theory and applications with existing methods.

Keywords: Weighted samples; Markov chain Monte Carlo; Rao-Blackwellization; Parallel computation; Simulation.
1 Introduction


Monte Carlo methods have become invaluable tools for solving demanding computational 
problems in a wide variety of scientific disciplines. In this paper, we propose weighting 
schemes for Markov chain Monte Carlo (MCMC) methods, where the main computational
step can often be implemented on modern computer architectures with large numbers of
cores. Since the weighting occurs within each iteration, we call this method locally weighted 
Markov chain Monte Carlo (LWMCMC). 

We will show that by allowing the points proposed in an MCMC algorithm, even those rejected, to take on weights, we can often improve statistical efficiency. The usual MCMC algorithms arise as special cases under specific weighting and proposal schemes in the framework we define. A measure of effective sample size (ESS) for this new class of algorithms is derived and shown to have natural connections to the existing measure of ESS for MCMC (Kass et al.; Liu, 2001, p. 126). LWMCMC improves upon the parallel Metropolis-Hastings method of Calderhead (2014). We show that our method can be interpreted as a Rao-Blackwellization of an extended version of his algorithm.


To illustrate the idea, we first describe our weighting scheme for the Metropolis-Hastings (MH) algorithm (Metropolis et al., 1953; Hastings, 1970). To facilitate later discussion, our exposition of the algorithm differs slightly from the standard description. Let the target density π be defined on a sample space S. The proposal kernel $K(dx_1; x)$ is a measure on S, with corresponding density $k(x_1; x)$. Set $x_0^{(1)}$ as the initial value and let j = 1. To estimate $\mu_h = \int_S h(x)\pi(x)\,dx$, the MH algorithm iterates:


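1. Draw a proposal $x_1^{(j)}$ from $K(dx_1; x_0^{(j)})$.

2. Calculate

$$r(x_1^{(j)}; x_0^{(j)}) = \min\left\{1, \frac{\pi(x_1^{(j)})\,k(x_0^{(j)}; x_1^{(j)})}{\pi(x_0^{(j)})\,k(x_1^{(j)}; x_0^{(j)})}\right\}.$$

3. Set $y = x_1^{(j)}$ with probability $r(x_1^{(j)}; x_0^{(j)})$ and $y = x_0^{(j)}$ with probability $1 - r(x_1^{(j)}; x_0^{(j)})$.

4. Set $x_0^{(j+1)} = y$, set j = j + 1 and go to step 1 until j = n.

5. Estimate $\mu_h$ with $\hat\mu_h = \frac{1}{n}\sum_{j=1}^{n} h(x_0^{(j)})$.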
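The steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes a symmetric Gaussian random-walk proposal (so the k terms cancel in r), works on the log scale, and uses NumPy's `default_rng`; the function name `rw_metropolis_hastings` and the step size 1.2 are our own illustrative choices.

```python
import numpy as np

def rw_metropolis_hastings(log_target, x0, n, step, rng):
    """Random-walk MH following steps 1-5 above (illustrative sketch).

    Assumes a symmetric Gaussian proposal, so k(x0; x1) = k(x1; x0) and the
    proposal densities cancel in r. Returns x_0^{(1)}, ..., x_0^{(n)}.
    """
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    log_p = log_target(x)
    chain = np.empty((n, x.size))
    for j in range(n):
        chain[j] = x                                         # current state x_0^{(j)}
        prop = x + step * rng.standard_normal(x.size)        # step 1: draw x_1^{(j)}
        log_p_prop = log_target(prop)
        log_r = min(0.0, log_p_prop - log_p)                 # step 2: log r
        if np.log(rng.uniform()) < log_r:                    # step 3: accept w.p. r
            x, log_p = prop, log_p_prop                      # step 4: x_0^{(j+1)} = y
    return chain

rng = np.random.default_rng(1)
log_std_normal = lambda x: -0.5 * float(x @ x)               # 2-D standard normal
chain = rw_metropolis_hastings(log_std_normal, np.zeros(2), 20_000, 1.2, rng)
mu_hat = chain.mean(axis=0)                                  # step 5 with h(x) = x
```

Working with log densities avoids overflow when π is evaluated far in the tails.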

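In this paper, we propose giving both $x_0^{(j)}$ and $x_1^{(j)}$ weights $w(x_0^{(j)})$ and $w(x_1^{(j)})$ for each j, and replacing step 5 with the new LWMCMC estimator

$$\hat\mu_h = \frac{1}{n}\sum_{j=1}^{n}\sum_{i=0}^{1} w(x_i^{(j)})\,h(x_i^{(j)}).$$

For instance, taking $w(x_0^{(j)}) = 1 - r(x_1^{(j)}; x_0^{(j)})$ and $w(x_1^{(j)}) = r(x_1^{(j)}; x_0^{(j)})$ results in an unbiased $\hat\mu_h$, which often has lower variance than the standard MH estimator. We will primarily be looking at two weighting schemes that give unbiased estimators, of which the aforementioned is the first version. Version 2 uses the weights

$$w(x_0^{(j)}) = \frac{\pi(x_0^{(j)})\,k(x_1^{(j)}; x_0^{(j)})}{\pi(x_0^{(j)})\,k(x_1^{(j)}; x_0^{(j)}) + \pi(x_1^{(j)})\,k(x_0^{(j)}; x_1^{(j)})}, \qquad w(x_1^{(j)}) = \frac{\pi(x_1^{(j)})\,k(x_0^{(j)}; x_1^{(j)})}{\pi(x_0^{(j)})\,k(x_1^{(j)}; x_0^{(j)}) + \pi(x_1^{(j)})\,k(x_0^{(j)}; x_1^{(j)})}.$$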
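A sketch of the two weighting versions for a single MH iteration, written on the log scale. The helper `two_point_weights` and its arguments are hypothetical names introduced here: `log_a` holds log π(x_0) + log k(x_1; x_0) and `log_b` holds log π(x_1) + log k(x_0; x_1).

```python
import numpy as np

def two_point_weights(log_a, log_b, version):
    """Return (w(x_0), w(x_1)) for one MH iteration (hypothetical helper).

    log_a = log pi(x_0) + log k(x_1; x_0)
    log_b = log pi(x_1) + log k(x_0; x_1)
    """
    if version == 1:
        # nu = 1: w(x_1) is the MH acceptance probability r(x_1; x_0) = min(1, b/a)
        w1 = np.exp(min(0.0, log_b - log_a))
    elif version == 2:
        # nu -> infinity (Barker): w(x_1) = b / (a + b), via logaddexp for stability
        w1 = np.exp(log_b - np.logaddexp(log_a, log_b))
    else:
        raise ValueError("version must be 1 or 2")
    return 1.0 - w1, w1
```

Inside an MH loop one would add `w0 * h(x0) + w1 * h(x1)` to a running sum at every iteration, regardless of which point the chain actually moves to, and divide by n at the end.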


One way to systematically construct weighting schemes that result in unbiased estimators is by noting that step 3 is a move from $x_0^{(j)}$ in a finite state Markov chain on $\{x_0^{(j)}, x_1^{(j)}\}$ defined by the transition matrix

$$P = \begin{pmatrix} 1 - r(x_1^{(j)}; x_0^{(j)}) & r(x_1^{(j)}; x_0^{(j)}) \\ r(x_0^{(j)}; x_1^{(j)}) & 1 - r(x_0^{(j)}; x_1^{(j)}) \end{pmatrix},$$

and that replacing step 3 with setting $y = x_1^{(j)}$ with probability $P^\nu_{12}$ and $y = x_0^{(j)}$ with probability $P^\nu_{11}$ for any ν ≥ 1 leaves π invariant. Here, $P^\nu_{ij}$ is the (i, j) entry of the νth power of P. Hence, taking the weights $w(x_0^{(j)}) = P^\nu_{11}$ and $w(x_1^{(j)}) = P^\nu_{12}$ in $\hat\mu_h$ results in an unbiased estimator. In particular, version 2 of the weights arises from taking ν → ∞, which corresponds to the stationary distribution of the Markov chain defined by P. It is easy to show that the version 2 weights satisfy $w^{(j)} = w^{(j)}P$, where $w^{(j)} = (w(x_0^{(j)}), w(x_1^{(j)}))$. They also appear in the acceptance-rejection rule of Barker (1965).
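The finite-state construction can be checked numerically: row 0 of $P^\nu$ should reproduce the version 1 weights at ν = 1 and approach the Barker (version 2) weights for large ν. The helper below is hypothetical and takes the unnormalised quantities a = π(x_0)k(x_1; x_0) and b = π(x_1)k(x_0; x_1) directly; the values a = 2, b = 0.5 are arbitrary.

```python
import numpy as np

def finite_chain_weights(a, b, nu):
    """Row 0 of P^nu for the two-state chain on {x_0, x_1} (hypothetical helper).

    P[0, 1] = r(x_1; x_0) and P[1, 0] = r(x_0; x_1) are MH acceptance
    probabilities; row 0 of P^nu gives (w(x_0), w(x_1)).
    """
    r10 = min(1.0, b / a)   # probability of moving x_0 -> x_1
    r01 = min(1.0, a / b)   # probability of moving x_1 -> x_0
    P = np.array([[1.0 - r10, r10],
                  [r01, 1.0 - r01]])
    return np.linalg.matrix_power(P, nu)[0]

a, b = 2.0, 0.5
w_nu1 = finite_chain_weights(a, b, 1)      # version 1: (1 - r, r) = (0.75, 0.25)
w_inf = finite_chain_weights(a, b, 200)    # large nu approximates nu -> infinity
barker = np.array([a, b]) / (a + b)        # version 2: (0.8, 0.2)
```

When a = b, both acceptance probabilities equal 1, P is periodic and $P^\nu$ does not converge, but its stationary vector (1/2, 1/2) still coincides with the version 2 weights.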


Moreover, the usual MH algorithm itself can be viewed as producing locally weighted samples, in which the accepted point gets weight 1 and the rejected point gets weight 0 in step 3. The gain in choosing other weighting schemes stems partly from the reduction in variation of the weights. The weighting scheme used to produce $\hat\mu_h$ can be chosen independently of the probability vector used to propagate the chain. Section 2 establishes the unbiasedness of $\hat\mu_h$, constructed in the way outlined above, for a generalized version of the MH algorithm.


In Section 3 we show how to compute the effective sample size (ESS) for a given chain and weighting scheme. Applying this measure to n = 10,000 samples obtained using the MH algorithm targeting a two-dimensional standard normal distribution, with a normal proposal kernel with covariance $1.2^2 I_2$, combined with the usual MH, ν = 1 and ν → ∞ weights, gives ESS = 1,189, 1,359 and 1,337, respectively. Since this simply weights exactly the same samples in three different ways, it shows that we can trivially improve the estimation procedure by using locally weighted samples. The proposal covariance was tuned to give an acceptance rate of roughly 50%, as recommended by Roberts et al. (1997).


The rest of this paper has four sections. In Section 2 we detail the main idea in a wider context. Section 3 gives a way of computing the algorithm's ESS. Section 4 provides illustrations of the method. Section 5 concludes, while the appendix contains the relevant proofs.
2 Locally weighted Markov chain Monte Carlo


2.1 Main Idea 

We begin by giving a more general algorithm than the above example. It naturally extends multi-proposal MCMC algorithms such as the multiple-try Metropolis (MTM) of Liu et al. (2000), and can improve on such methods by not discarding potentially useful information available in the MCMC. By making multiple proposals within each iteration, MTM allows the use of transition kernels corresponding to large searching regions, hence mediating the “conflict of interest” between the desired step size and desired acceptance rate that arises in MH.

According to a sampling rule on the set of proposals (Liu et al., 2000; Qin and Liu, 2001), MTM chooses one candidate point for an acceptance-rejection step. Typically, this is a “good” point which is often accepted. In addition to performing such an acceptance-rejection step, we also advocate using the available weights to give estimators with smaller variance than the standard MCMC estimator.


As before, we wish to sample from the target distribution π, known up to a normalizing constant, defined on a sample space S. Let $K(dx_1, \dots, dx_M; x)$ denote a one-to-M kernel, defined as a probability measure on $S^M$ with density function $k(x_1, \dots, x_M; x)$. K summarizes the proposal generation process, and allows for dependency and deterministic relations among the proposed points. Define an (M + 1)-to-one kernel $T(dy; x_0, x_1, \dots, x_M)$, which is a probability measure on S. The only restriction on T is that it has to leave π invariant; it can be taken to be any acceptance-rejection step or sampling rule that satisfies this. Examples are given in Sections 2.2 and 4.


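Algorithm 2.1 (Locally weighted MCMC). Set $x_0^{(1)}$ to be the initial value and set j = 1. Collect points $\{x_i^{(j)} : i = 0, \dots, M;\ j = 1, \dots, n\}$ and weights $\{w(x_i^{(j)}) : i = 0, \dots, M;\ j = 1, \dots, n\}$ to estimate $\mu_h$ according to the steps:

1. Draw proposals $\{x_1^{(j)}, \dots, x_M^{(j)}\}$ from $K(dx_1, \dots, dx_M; x_0^{(j)})$.

2. Calculate and store the weights $w(x_i^{(j)})$ according to a weighting scheme chosen by the user, e.g. (but not restricted to):

Version 1: $w(x_i^{(j)}) = \frac{1}{M}\min\left\{1, \frac{\pi(x_i^{(j)})\,k(x_{[-i]}^{(j)}; x_i^{(j)})}{\pi(x_0^{(j)})\,k(x_{[-0]}^{(j)}; x_0^{(j)})}\right\}$ for i = 1, ..., M, and $w(x_0^{(j)}) = 1 - \sum_{i=1}^{M} w(x_i^{(j)})$,

Version 2: $w(x_i^{(j)}) = \frac{\pi(x_i^{(j)})\,k(x_{[-i]}^{(j)}; x_i^{(j)})}{\sum_{l=0}^{M}\pi(x_l^{(j)})\,k(x_{[-l]}^{(j)}; x_l^{(j)})}$, for i = 0, ..., M,

where $x_{[-i]}^{(j)} = \{x_0^{(j)}, \dots, x_{i-1}^{(j)}, x_{i+1}^{(j)}, \dots, x_M^{(j)}\}$.

3. Draw y from $T(dy; x_0^{(j)}, x_1^{(j)}, \dots, x_M^{(j)})$.

4. Set $x_0^{(j+1)} = y$ and j = j + 1, and go to step 1 until j = n.

5. Estimate $\mu_h$ with $\hat\mu_h = \frac{1}{n}\sum_{j=1}^{n}\sum_{i=0}^{M} w(x_i^{(j)})\,h(x_i^{(j)})$.

The weights are normalised within each iteration, such that $\sum_{i=0}^{M} w(x_i^{(j)}) = 1$.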

Version 1 corresponds to the multi-proposal version of the ν = 1 weights from Section 1. Likewise, version 2 uses the analogous ν → ∞ weights. Other weighting schemes, e.g. for ν ≥ 2, are potentially also useful, but typically require more computation.
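A minimal sketch of the two weighting versions in step 2. The input is a hypothetical array `log_a` whose entries are log π(x_i) + log k(x_[-i]; x_i), with index 0 the current state. Version 1 is implemented as choosing one of the M proposals uniformly at random and accepting it with the MH ratio, which is our reading of the multi-proposal ν = 1 weights; version 2 is a softmax shifted by its maximum for numerical stability.

```python
import numpy as np

def lw_weights(log_a, version):
    """Within-iteration weights for Algorithm 2.1 (hypothetical helper).

    log_a[i] = log pi(x_i) + log k(x_[-i]; x_i), i = 0, ..., M, where index 0
    is the current state. Returns M + 1 weights that sum to one.
    """
    log_a = np.asarray(log_a, dtype=float)
    M = log_a.size - 1
    if version == 2:
        # Normalised a_i; subtracting the max leaves the ratios unchanged
        z = np.exp(log_a - log_a.max())
        return z / z.sum()
    if version == 1:
        w = np.empty(M + 1)
        # Pick one of the M proposals uniformly, accept with the MH ratio
        w[1:] = np.exp(np.minimum(0.0, log_a[1:] - log_a[0])) / M
        w[0] = 1.0 - w[1:].sum()   # remaining mass stays at the current state
        return w
    raise ValueError("version must be 1 or 2")
```

Each version-1 weight is at most 1/M, so $w(x_0^{(j)})$ is never negative.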


2.2 Choice of the propagation kernel T 

T can be taken as any acceptance-rejection step or resampling rule that leaves π invariant. For instance, T can be taken independent of the set of proposals $\{x_1, \dots, x_M\}$ generated by K. The class of such T's essentially contains all MCMC methods. Examples include sampling y according to a standard MH or Gibbs step from $x_0$, for which $T(dy; x_0, x_1, \dots, x_M) = T(dy; x_0)$.

Perhaps more useful is to allow T to use information about the set $\{x_1, \dots, x_M\}$. For example, by letting $T(dy; x_0, x_1, \dots, x_M) = \sum_{i=0}^{M} w(x_i)\,\delta_{x_i}(dy)$, where $\delta_x$ denotes the Dirac measure at x, we encompass Calderhead's algorithm (see the appendix). T allows us to store weights according to one weighting scheme, but propagate using another. Moreover, the algorithm also has the flexibility to use MTM rules on $\{x_1, \dots, x_M\}$ to choose a “good” point to move to. An explicit example is given in Section 4.2.
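To see Algorithm 2.1 end to end, the following sketch uses an independence proposal, under which $\pi(x_i)k(x_{[-i]}; x_i)$ is proportional to the importance ratio $\pi(x_i)/q(x_i)$, combined with the Calderhead-style T that resamples one point with the version 2 weights. The Gaussian proposal q, its scale, the 1-D standard normal target in the usage lines and the function name are all assumptions made for illustration.

```python
import numpy as np

def lwmcmc_independence(log_target, n, M, q_scale, rng, h=lambda x: x):
    """Algorithm 2.1 with independent N(0, q_scale^2) proposals on the real line.

    With independent proposals, k(x_[-i]; x_i) = prod_{l != i} q(x_l), so
    pi(x_i) k(x_[-i]; x_i) is proportional to pi(x_i) / q(x_i).
    T resamples one point with the version 2 weights (Calderhead-style T).
    """
    log_q = lambda x: -0.5 * (x / q_scale) ** 2       # constants cancel in the weights
    x0 = 0.0
    total = 0.0
    for _ in range(n):
        pts = np.concatenate(([x0], q_scale * rng.standard_normal(M)))  # step 1
        log_a = log_target(pts) - log_q(pts)
        w = np.exp(log_a - log_a.max())
        w /= w.sum()                                   # step 2, version 2
        total += np.sum(w * h(pts))                    # running sum for step 5
        x0 = pts[rng.choice(M + 1, p=w)]               # steps 3-4: T = sum_i w_i delta_{x_i}
    return total / n                                   # step 5

rng = np.random.default_rng(2)
log_std_normal = lambda x: -0.5 * x ** 2
m1 = lwmcmc_independence(log_std_normal, 5000, 10, 2.0, rng)
m2 = lwmcmc_independence(log_std_normal, 5000, 10, 2.0, rng, h=lambda x: x ** 2)
```

A proposal wider than the target keeps the ratio π/q bounded, which keeps the weights well behaved.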


2.3 Properties 


Theorem 2.1. The estimators produced by versions 1 and 2 of Algorithm 2.1 are unbiased for $\mu_h$ for any measurable h.

The proof is given in the appendix. It relies on Algorithm A.1, which encodes the weights arising in Algorithm 2.1 with an empirical distribution. Algorithm A.1 is itself a generalization of Calderhead's algorithm, introducing the flexibility of T in the propagation step. The empirical distribution introduces Monte Carlo error, which is the subject of Theorem 2.2.

Theorem 2.2. Given the same weighting scheme, Algorithm 2.1 is a Rao-Blackwellization of Algorithm A.1.

In the special case of T being Calderhead's propagation rule, Algorithm 2.1 is a Rao-Blackwellization of Calderhead's algorithm. Again, the proof is given in the appendix.
3 Effective Sample Size of LWMCMC


To evaluate LWMCMC, we derive a measure of effective sample size. If $\hat\mu$ is the standard estimator of the mean based on the samples and their weights, and $\sigma^2$ is the variance of π, ESS is defined as ESS = $\sigma^2/\mathrm{Var}(\hat\mu)$. Recall that the output of Algorithm 2.1 is n(M + 1) weighted samples, producing the mean estimate

$$\hat\mu = \frac{1}{n}\sum_{j=1}^{n}\bar{x}^{(j)}, \qquad \text{where} \quad \bar{x}^{(j)} = \sum_{i=0}^{M} w(x_i^{(j)})\,x_i^{(j)}.$$

Proposition 3.1. The ESS for samples and weights of the form produced by Algorithm 2.1 can be written as

$$\mathrm{ESS} = \frac{n\,\sigma^2}{\mathrm{Var}(\bar{x})\left(1 + 2\sum_{k=1}^{\infty}\gamma_k\right)},$$

where $\gamma_k$ is the lag-k autocorrelation function of $\{\bar{x}^{(j)}\}$ and $\mathrm{Var}(\bar{x}^{(j)}) = \mathrm{Var}(\bar{x})$ for all j by stationarity.


The proof is given in the appendix. In the case of the usual MH and its multi-proposal extensions, one always takes $w(x_0^{(j)}) = 1$ and $w(x_i^{(j)}) = 0$ for i ≥ 1. Then $\bar{x}^{(j)} = x_0^{(j)}$ and $x_0^{(j)} \sim \pi$ for all j, assuming the chain has converged, so that $\mathrm{Var}(\bar{x}^{(j)}) = \mathrm{Var}(\bar{x}) = \sigma^2$ for all j. Moreover, $\gamma_k = \rho_k$, where $\rho_k$ is the lag-k autocorrelation of the Markov chain $\{x_0^{(j)}\}_{j=1}^{n}$. So the above expression reduces to the standard measure of ESS in the case of MCMC, namely ESS = $n/(1 + 2\sum_k \rho_k)$.


To estimate ESS for LWMCMC, substitute $1 + 2\sum_k \gamma_k$ with an estimate of the normalized spectral density of $\{\bar{x}^{(j)}\}$ at frequency 0 (see e.g. Andrews, 1991; Muller, 2014), and $\sigma^2$ and $\mathrm{Var}(\bar{x})$ with their respective moment estimators.


Given exactly the same propagating chain, LWMCMC beats standard MCMC in terms of ESS whenever $\mathrm{Var}(\bar{x})(1 + 2\sum_k \gamma_k) < \sigma^2(1 + 2\sum_k \rho_k)$, which is an easy condition to check. A good kernel K will typically make $\mathrm{Var}(\bar{x})$ small, while T is important in making $1 + 2\sum_k \gamma_k$ small.
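The estimator of Proposition 3.1 can be computed directly from stored points and weights. This sketch handles 1-D samples only, and it replaces the spectral density at frequency zero by a Bartlett-kernel long-run variance estimate in the Newey-West style with a common rule-of-thumb bandwidth. Both the kernel and the bandwidth are our assumptions; other spectral estimators could be substituted.

```python
import numpy as np

def lwmcmc_ess(points, weights, bandwidth=None):
    """Estimate the ESS of Proposition 3.1 from (n, M + 1) arrays (1-D target).

    xbar_j = sum_i w_ij x_ij; sigma^2 is the weighted moment estimator of
    Var_pi(x); Var(xbar)(1 + 2 sum_k gamma_k) is estimated by a Bartlett-kernel
    long-run variance of {xbar_j} (an assumed choice of spectral estimator).
    """
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n = points.shape[0]
    xbar = np.sum(weights * points, axis=1)
    mu_hat = xbar.mean()
    sigma2 = np.mean(np.sum(weights * (points - mu_hat) ** 2, axis=1))
    if bandwidth is None:
        bandwidth = int(np.floor(4 * (n / 100) ** (2 / 9)))   # rule-of-thumb lag
    d = xbar - mu_hat
    lrv = d @ d / n                                  # lag-0 autocovariance
    for k in range(1, bandwidth + 1):
        cov_k = d[k:] @ d[:-k] / n                   # lag-k autocovariance
        lrv += 2 * (1 - k / (bandwidth + 1)) * cov_k # Bartlett weight
    return n * sigma2 / lrv
```

With M = 0 and unit weights this reduces to the usual MCMC ESS estimate; for an AR(1) chain with coefficient 0.5 it should return roughly n/3, slightly more because the Bartlett kernel truncates the autocorrelation sum.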


4 Numerical example 

We apply Algorithm 2.1 to the two-dimensional conditional density

$$\pi(z, \theta \mid y) \propto \exp\left\{-\frac{(y - \theta z)^2}{2\sigma^2} - \frac{(z - \theta)^2}{2}\right\},$$

which arises under a flat prior on θ in the indirect observation model

$$y = \theta z + \varepsilon, \qquad z \mid \theta \sim N(\theta, 1), \qquad \varepsilon \sim N(0, \sigma^2), \qquad \varepsilon \perp (z, \theta)$$

(see Chen et al., 2001, for further details). Our interest lies in estimating the mean of θ. When σ is small, most of the density degenerates to lie around the curve y = θz, making sampling particularly hard. Figure 1 shows the contours of the density when σ = 0.1 and y is observed to be 1.
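A sketch of the unnormalised log target and its gradient (the gradient is needed for HMC in Section 4.2), with y = 1 and σ = 0.1 as defaults to match Figure 1. The function names are our own; the gradient follows from differentiating the exponent above: $\partial_z = \theta(y - \theta z)/\sigma^2 - (z - \theta)$ and $\partial_\theta = z(y - \theta z)/\sigma^2 + (z - \theta)$.

```python
import numpy as np

def log_target(x, y=1.0, sigma=0.1):
    """Unnormalised log pi(z, theta | y) for the indirect observation model."""
    z, theta = x
    return -0.5 * (y - theta * z) ** 2 / sigma ** 2 - 0.5 * (z - theta) ** 2

def grad_log_target(x, y=1.0, sigma=0.1):
    """Gradient of log_target with respect to (z, theta)."""
    z, theta = x
    resid = (y - theta * z) / sigma ** 2
    return np.array([theta * resid - (z - theta),
                     z * resid + (z - theta)])
```

A central finite-difference check is a cheap way to confirm hand-derived gradients before using them in a leapfrog integrator.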


4.1 Locally weighted Metropolis-Hastings 

To illustrate the algorithm in a simple case, we first implement the very blunt random walk (RW) MH and its exact LWMCMC counterpart given in Section 1. K is taken to sample $x_1^{(j)} = (z_1^{(j)}, \theta_1^{(j)})$ from a normal random walk centred at $x_0^{(j)}$ with step size λ, and T performs the usual MH rejection step. Figures 2 and 3 show the samples obtained using MH and LWMCMC MH, respectively, with n = 10,000. The step size of the RW MH was tuned to maximize ESS, corresponding to λ = 0.45.

In Figure 3, the black dots represent the weighted samples produced by K, even those rejected in the propagation step. The red dots represent the samples produced by T, i.e. the actual propagating samples. Many of the black dots have close to zero weight, and hence contribute little to our estimate of the mean of θ. ESS was computed to be 195 for MH, 220 for LWMCMC with ν = 1 and 216 for ν → ∞ when using exactly the same propagating chain in the three cases, showing only a small gain in efficiency for LWMCMC in this setting. It is clear that neither of these algorithms is well suited to this problem.
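A minimal sketch of this experiment. It assumes an isotropic proposal $N(x_0^{(j)}, \lambda^2 I_2)$, which is our reading of "step size λ", and returns the three estimates of the mean of θ (standard MH, ν = 1 and ν → ∞) computed from one and the same propagating chain. The starting point and function names are illustrative, and the output depends on the seed; it does not reproduce the ESS values quoted above.

```python
import numpy as np

def log_target(x, y=1.0, sigma=0.1):
    """Unnormalised log pi(z, theta | y), as in the density above."""
    z, theta = x
    return -0.5 * (y - theta * z) ** 2 / sigma ** 2 - 0.5 * (z - theta) ** 2

def lw_rw_mh(n, lam, rng, x0=(1.0, 1.0)):
    """RW MH with the nu = 1 and nu -> infinity estimators of E(theta).

    Assumes K draws x_1 ~ N(x_0, lam^2 I_2); T is the usual MH step.
    Returns (standard MH, nu = 1, nu -> infinity) estimates.
    """
    x = np.array(x0, dtype=float)
    lp = log_target(x)
    sums = np.zeros(3)
    for _ in range(n):
        prop = x + lam * rng.standard_normal(2)
        lp_prop = log_target(prop)
        r = np.exp(min(0.0, lp_prop - lp))                # symmetric K: only pi ratio
        b = np.exp(lp_prop - np.logaddexp(lp, lp_prop))   # Barker weight on x_1
        sums += [x[1],                                    # standard MH: h(x_0)
                 (1 - r) * x[1] + r * prop[1],            # nu = 1
                 (1 - b) * x[1] + b * prop[1]]            # nu -> infinity
        if rng.uniform() < r:                             # T: accept with prob. r
            x, lp = prop, lp_prop
    return sums / n

est = lw_rw_mh(20_000, 0.45, np.random.default_rng(0))
```

All three estimators reuse the same chain, so differences between them reflect only the weighting.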


[Figure 1: Contours of π (target density contours).]
[Figure 2: RW MH (MH samples).]
[Figure 3: LWMCMC RW MH (LWMCMC MH samples).]


4.2 Locally weighted Hamiltonian Monte Carlo 

The real benefit of LWMCMC arises when multiple points are proposed within each iteration. We illustrate one such algorithm here, where the leapfrog integration path that arises in Hamiltonian Monte Carlo (HMC) (Duane et al., 1987) is taken as the set of proposals in Algorithm 2.1. In HMC the state $x = (z, \theta)^T$ is augmented with an auxiliary momentum variable p. The Hamiltonian is defined as

$$H(x, p) = -\log\pi(x) + \frac{1}{2}p^T W^{-1} p,$$

where W is symmetric, positive-definite and chosen by the user, and p has the dimension of S. As the system evolves in time it obeys Hamilton's equations


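$$\frac{dx}{dt} = \frac{\partial H(x, p)}{\partial p} = W^{-1}p, \qquad \frac{dp}{dt} = -\frac{\partial H(x, p)}{\partial x} = \{\nabla\log\pi(x)\}^T,$$

which conserve the energy H(x, p). These equations need to be solved using numerical methods. The literature typically favors leapfrog integration, since it is reversible and outperforms Euler discretization. The leapfrog algorithm we use iterates

$$x_{t+\delta/2} = x_t + \frac{\delta}{2}W^{-1}p_t,$$
$$p_{t+\delta} = p_t + \delta\{\nabla\log\pi(x_{t+\delta/2})\}^T,$$
$$x_{t+\delta} = x_{t+\delta/2} + \frac{\delta}{2}W^{-1}p_{t+\delta}.$$

Since the position update is typically much cheaper to evaluate than the momentum update, which requires the gradient of log π, this has the same order of computational complexity as the standard leapfrog method.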
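The leapfrog scheme above can be sketched as follows. It stores every visited state, since in LWMCMC the whole path serves as the set of proposals. The function name and the choice to return full arrays are our own; `grad_log_pi` is any user-supplied gradient of log π.

```python
import numpy as np

def leapfrog_path(x, p, grad_log_pi, delta, steps, W_inv):
    """Position-momentum-position leapfrog, storing every visited state.

    x_{t+d/2} = x_t + (d/2) W^{-1} p_t
    p_{t+d}   = p_t + d grad log pi(x_{t+d/2})
    x_{t+d}   = x_{t+d/2} + (d/2) W^{-1} p_{t+d}
    Returns positions and momenta as arrays of shape (steps + 1, dim).
    """
    xs, ps = [np.array(x, dtype=float)], [np.array(p, dtype=float)]
    for _ in range(steps):
        x_half = xs[-1] + 0.5 * delta * W_inv @ ps[-1]
        p_new = ps[-1] + delta * grad_log_pi(x_half)   # one gradient per step
        xs.append(x_half + 0.5 * delta * W_inv @ p_new)
        ps.append(p_new)
    return np.array(xs), np.array(ps)
```

Running the integrator from the end point with negated momentum retraces the path exactly, which is the reversibility property that makes the path usable as a symmetric proposal set.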


Duane et al. (1987) realized the following: if we can simulate from the augmented distribution $\pi^*(x, p) \propto \exp\{-H(x, p)\}$, then, marginally, x ∼ π and $\phi(p) \propto \exp(-0.5\,p^T W^{-1} p)$, so that p is normally distributed with covariance W. If $x_0^{(j)}$ is the state of the chain at iteration j, the HMC algorithm performs the steps

1. Draw a momentum vector $p_0^{(j)} \sim \phi$.

2. Starting from $(x_0^{(j)}, p_0^{(j)})$, run the leapfrog integrator M steps using a time increment of δ to obtain the proposal $(x_M^{(j)}, p_M^{(j)})$.

3. Set $x_0^{(j+1)} = x_M^{(j)}$ with probability

$$r = \min[1, \exp\{-H(x_M^{(j)}, p_M^{(j)}) + H(x_0^{(j)}, p_0^{(j)})\}],$$

and $x_0^{(j+1)} = x_0^{(j)}$ otherwise; set j = j + 1, and go to step 1 until j = n.
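For reference, a sketch of one conventional HMC transition (steps 1-3), assuming user-supplied `log_pi` and `grad_log_pi`. It repeats the leapfrog updates inline so that the block is self-contained; the function name and the standard normal usage in the test are illustrative only.

```python
import numpy as np

def hmc_step(x0, log_pi, grad_log_pi, delta, M, W, rng):
    """One conventional HMC transition following steps 1-3 (sketch)."""
    x0 = np.asarray(x0, dtype=float)
    W_inv = np.linalg.inv(W)
    H = lambda x, p: -log_pi(x) + 0.5 * p @ W_inv @ p        # Hamiltonian
    p0 = rng.multivariate_normal(np.zeros(x0.size), W)       # step 1: p ~ N(0, W)
    x, p = x0.copy(), p0.copy()
    for _ in range(M):                                       # step 2: M leapfrog steps
        x = x + 0.5 * delta * W_inv @ p
        p = p + delta * grad_log_pi(x)
        x = x + 0.5 * delta * W_inv @ p
    log_r = min(0.0, -H(x, p) + H(x0, p0))                   # step 3: log acceptance prob.
    return x if np.log(rng.uniform()) < log_r else x0
```

Only the end point $x_M^{(j)}$ is used here; the intermediate leapfrog states are computed and discarded, which is exactly the information the locally weighted version retains.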

HMC has been shown to be particularly useful for densities of the kind we are sampling from here, as Hamilton's equations prevent the proposals from escaping the energy well induced by the density. Note that when M = 1, HMC reduces to the Metropolis-adjusted Langevin algorithm (Roberts and Tweedie, 1996). For an HMC algorithm performing M > 1 leapfrog steps, Neal (1994) introduced the idea of sampling I uniformly from the set {0, ..., M} and running the leapfrog integrator I steps backwards and M − I steps forwards, which was further developed in Qin and Liu (2001). This introduces symmetry among the points on the leapfrog path, which allows us to construct a K that can be used in LWMCMC.


Specifically, sample I as above and set $x_I^{(j)}$ equal to the current state of the chain. Note here that the random index i = I takes on the same meaning as the index i = 0 did earlier. In parallel, run leapfrog integration backward in time for I steps, generating $\{x_{I-1}^{(j)}, \dots, x_0^{(j)}\}$, and forward for M − I steps, generating $\{x_{I+1}^{(j)}, \dots, x_M^{(j)}\}$. The set of associated momentum vectors is $\{p_0^{(j)}, \dots, p_M^{(j)}\}$. Let K denote the measure that generates this proposal process. A challenge with this K is that we are not able to fully exploit the trivial parallelization potential of the algorithm, since the integrator is sequential in nature.


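By the symmetric sampling of I, the ν → ∞ weights of the proposals are

$$w(x_i^{(j)}) = \frac{\exp\{-H(x_i^{(j)}, p_i^{(j)})\}}{\sum_{l=0}^{M}\exp\{-H(x_l^{(j)}, p_l^{(j)})\}}, \quad \text{for } i = 0, \dots, M.$$

For the same reason, the ν = 1 weights are

$$w(x_i^{(j)}) = \frac{1}{M}\min[1, \exp\{-H(x_i^{(j)}, p_i^{(j)}) + H(x_I^{(j)}, p_I^{(j)})\}] \quad \text{for } i \neq I, \qquad w(x_I^{(j)}) = 1 - \sum_{i \neq I} w(x_i^{(j)}).$$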


Moreover, let a = 0 if I > M − I and a = M otherwise. Take T to be the measure that performs the usual HMC Metropolis step comparing the initial point $x_I^{(j)}$ with $y = x_a^{(j)}$. That is, accept $x_a^{(j)}$ with probability

$$r = \min[1, \exp\{-H(x_a^{(j)}, p_a^{(j)}) + H(x_I^{(j)}, p_I^{(j)})\}],$$

and otherwise remain at $x_I^{(j)}$.
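A sketch of the weights and the Metropolis target index for this LW HMC scheme, taking as input the energies $H(x_i^{(j)}, p_i^{(j)})$ along the path and the sampled index I. The helper name is hypothetical; subtracting the minimum energy before exponentiating is only a numerical safeguard and does not change the normalised weights.

```python
import numpy as np

def lw_hmc_weights(H_path, I):
    """Weights and Metropolis target index for the LW HMC scheme (sketch).

    H_path[i] = H(x_i, p_i) along the leapfrog path, i = 0, ..., M, with the
    current state stored at the uniformly drawn index I.
    """
    H_path = np.asarray(H_path, dtype=float)
    M = H_path.size - 1
    # nu -> infinity: normalised exp{-H(x_i, p_i)}
    e = np.exp(-(H_path - H_path.min()))
    w_inf = e / e.sum()
    # nu = 1: choose one of the other M points uniformly, accept with the HMC ratio
    w_one = np.exp(np.minimum(0.0, H_path[I] - H_path)) / M
    w_one[I] = 0.0
    w_one[I] = 1.0 - w_one.sum()
    # T compares x_I with the path end point farther from I
    a = 0 if I > M - I else M
    return w_inf, w_one, a
```

The weights are stored for the estimator, while T only ever moves to $x_a^{(j)}$ or stays at $x_I^{(j)}$, so storing and propagating use different rules, as allowed in Section 2.2.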


Table 1 summarizes the results of conventional HMC against LWMCMC HMC and how ESS scales with M, with n = 1,000, δ = 0.05 and $W = I_2$. We also compare LWMCMC to Calderhead's algorithm for different values of M and the resampling parameter N. See Algorithm A.1 and Remarks A.1 and A.2 of the appendix for further details and comparisons between the algorithms. We apply the same measure of effective sample size, noting that Calderhead's algorithm induces a weighting scheme as mentioned in Remark A.2. The improvement in ESS of LWMCMC over Calderhead's algorithm stems both from choosing a more useful T and from the Rao-Blackwellization discussed in Section 2.3. The improvements in LWMCMC and Calderhead's method as M increases are due to the decreasing variance of $\bar{x}$. Note also that we observe super-efficiency in the HMC sampling scheme, where negative correlation among the samples leads to an ESS greater than the number of samples drawn.


Figures 4 and 5 display the HMC and LWMCMC HMC samples obtained when M = 60. 
In Figure 5, the black dots represent all the positions visited in the leapfrog integration, 
and hence are the weighted samples produced by K. The red dots represent the samples 
produced by T, i.e. the propagating samples. 


[Figure 4: HMC (HMC samples, horizontal axis z).]
[Figure 5: LWMCMC HMC (LWMCMC HMC samples, horizontal axis z).]

Effective Sample Size (ESS)

              LWMCMC HMC                   Calderhead HMC, ν → ∞
  M      ν = 1    ν → ∞      HMC    N = 1   N = 10   N = 50   N = 200   N = 1000
  5         39       39       26      7.6      8.0      8.0       8.0        8.0
 30        624      625      678      158      185      202       203        203
 60      2,468    2,483    1,137      440    1,333    1,544     1,525      1,512
 90      4,773    4,792      938      705    2,638    3,103     3,275      3,282
240      9,613    9,681    1,061      916    3,312    5,207     6,235      6,333

Table 1: ESS for HMC, LWMCMC HMC, and Calderhead at different values of N.

5 Conclusions

In this paper we have proposed the locally weighted Markov chain Monte Carlo algorithm (LWMCMC), which dominates its parallel MCMC counterpart and typically improves upon standard MCMC. We show how to compute the effective sample size of the LWMCMC output and illustrate its performance on a toy example. The LWMCMC algorithm is well suited to modern computer architectures with massive numbers of cores, which can dramatically increase computational efficiency.


A Proofs and supplementary material 

A.1 Proof of Theorem 2.1

Theorem 2.1. The estimators produced by versions 1 and 2 of Algorithm 2.1 are unbiased for $\mu_h$ for any measurable h.


Before proving the theorem, we prove the following lemma. 


Lemma A.1. Samples obtained according to Algorithm A.1 given below are draws from π.


Algorithm A.1. Set $x_0^{(1)}$ to be the initial value and set j = 1. Collect points $\{y_i^{(j)} : i = 1, \dots, N;\ j = 1, \dots, n\}$ according to the steps:

1. Draw proposals $\{x_1^{(j)}, \dots, x_M^{(j)}\}$ from $K(dx_1, \dots, dx_M; x_0^{(j)})$.

2. Sample N points $\{y_1^{(j)}, \dots, y_N^{(j)}\}$ (with replacement) from $\{x_0^{(j)}, \dots, x_M^{(j)}\}$ with probabilities $w(x_i^{(j)})$.

3. Draw y from $T(dy; x_0^{(j)}, x_1^{(j)}, \dots, x_M^{(j)})$.

4. Set $x_0^{(j+1)} = y$ and j = j + 1, and go to step 1 until j = n.

5. Estimate $\mu_h$ with $\tilde\mu_h = \frac{1}{nN}\sum_{j=1}^{n}\sum_{i=1}^{N} h(y_i^{(j)})$.


Proof of Lemma A.1. $\{x_0^{(j)} : j = 1, \dots, n\}$ is a standard MCMC sampler by construction of the transition kernel T. Since Calderhead's algorithm is valid for versions 1 and 2 of the weighting scheme, we know that step 2 draws samples from π given that $x_0^{(j)}$ follows π. Combining these two arguments, we establish that the samples $\{y_i^{(j)} : i = 1, 2, \dots, N;\ j = 1, 2, \dots, n\}$ all have marginal distribution π. □


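Proof of Theorem 2.1. Let $y^{(j)}$ be a point drawn according to step 2 of Algorithm A.1 from $\{x_0^{(j)}, \dots, x_M^{(j)}\}$, so that $y^{(j)} \sim \pi$ by Lemma A.1. Let $x^{(j)} = \{x_0^{(j)}, \dots, x_M^{(j)}\}$. Then, by the tower property,

$$\mu_h = E\{h(y^{(j)})\} = E\left[E\{h(y^{(j)}) \mid x^{(j)}\}\right] = E\left\{\sum_{i=0}^{M} w(x_i^{(j)})\,h(x_i^{(j)})\right\},$$

so each term of $\hat\mu_h$ in Algorithm 2.1 has expectation $\mu_h$, and hence so does $\hat\mu_h$. □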


A.2 Proof of Theorem 2.2 and remarks

Theorem 2.2. Given the same weighting scheme, Algorithm 2.1 is a Rao-Blackwellization of Algorithm A.1.


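Proof. Let $x^{(j)} = \{x_0^{(j)}, \dots, x_M^{(j)}\}$ and $y^{(j)} = \{y_1^{(j)}, \dots, y_N^{(j)}\}$. Then

$$\mathrm{Var}\left\{\frac{1}{nN}\sum_{j=1}^{n}\sum_{i=1}^{N} h(y_i^{(j)})\right\} = \frac{1}{n^2N^2}\sum_{j=1}^{n}\mathrm{Var}\left\{\sum_{i=1}^{N} h(y_i^{(j)})\right\} + \frac{2}{n^2N^2}\sum_{j<k}\mathrm{Cov}\left\{\sum_{i=1}^{N} h(y_i^{(j)}), \sum_{i=1}^{N} h(y_i^{(k)})\right\}.$$

By the law of total variance, for the first of these terms we have

$$\mathrm{Var}\left\{\sum_{i=1}^{N} h(y_i^{(j)})\right\} \ge \mathrm{Var}\left[E\left\{\sum_{i=1}^{N} h(y_i^{(j)}) \,\middle|\, x^{(j)}\right\}\right] = N^2\,\mathrm{Var}\left\{\sum_{i=0}^{M} w(x_i^{(j)})\,h(x_i^{(j)})\right\}.$$

For the second, we will show that

$$\sum_{j<k}\mathrm{Cov}\left\{\sum_{i=1}^{N} h(y_i^{(j)}), \sum_{i=1}^{N} h(y_i^{(k)})\right\} = N^2\sum_{j<k}\mathrm{Cov}\left\{\sum_{i=0}^{M} w(x_i^{(j)})\,h(x_i^{(j)}), \sum_{i=0}^{M} w(x_i^{(k)})\,h(x_i^{(k)})\right\}.$$

Assume j < k without loss of generality. By the law of total covariance we then have

$$\mathrm{Cov}\left\{\sum_{i=1}^{N} h(y_i^{(j)}), \sum_{i=1}^{N} h(y_i^{(k)})\right\} = E\left[\mathrm{Cov}\left\{\sum_{i=1}^{N} h(y_i^{(j)}), \sum_{i=1}^{N} h(y_i^{(k)}) \,\middle|\, x^{(j)}, x^{(k)}\right\}\right] + \mathrm{Cov}\left[E\left\{\sum_{i=1}^{N} h(y_i^{(j)}) \,\middle|\, x^{(j)}, x^{(k)}\right\}, E\left\{\sum_{i=1}^{N} h(y_i^{(k)}) \,\middle|\, x^{(j)}, x^{(k)}\right\}\right]$$
$$= \mathrm{Cov}\left\{N\sum_{i=0}^{M} w(x_i^{(j)})\,h(x_i^{(j)}), N\sum_{i=0}^{M} w(x_i^{(k)})\,h(x_i^{(k)})\right\},$$

where the last equality follows from the conditional independence of $y^{(j)}$ from all the other samples and proposals given $x^{(j)}$: the conditional covariance term vanishes, and each conditional expectation depends only on $x^{(j)}$ or $x^{(k)}$, respectively. Summarizing these results, we can conclude that

$$\mathrm{Var}\left\{\frac{1}{nN}\sum_{j=1}^{n}\sum_{i=1}^{N} h(y_i^{(j)})\right\} \ge \mathrm{Var}\left\{\frac{1}{n}\sum_{j=1}^{n}\sum_{i=0}^{M} w(x_i^{(j)})\,h(x_i^{(j)})\right\}.$$

□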


Remark A.1. Hence, with the same amount of computational time, but less memory and no resampling step, we are able to do better than Algorithm A.1. The proof of Theorem 2.2 illuminates where the reduction in variance occurs: only in the within-iteration variance terms, while the covariance terms between iterations are unchanged. This indicates that the method is still sensitive to the properties of the Markov chain used to propagate through the sample space. In particular, our method will have the same degree of stickiness as Algorithm A.1.


Remark A.2. Algorithm A.1 also induces a weighting scheme. Specifically, if $N_i^{(j)}$ is the number of times the proposal $x_i^{(j)}$ is resampled out of the N resampled points, the weights are $N_i^{(j)}/N$. Note that $(N_0^{(j)}, \dots, N_M^{(j)}) \sim \mathrm{Multinomial}[N, \{w(x_0^{(j)}), \dots, w(x_M^{(j)})\}]$. Thus, both Calderhead's algorithm and Algorithm A.1 attempt to encode the information about this multinomial distribution using an empirical distribution. This creates a loss of information, as proved in Theorem 2.2. The Dvoretzky-Kiefer-Wolfowitz inequality (Dvoretzky et al., 1956) provides probability bounds for how close the empirical CDF of the resampled points is to the actual CDF as a function of N. This can give some indication of how large N would have to be chosen for Algorithm A.1 and Calderhead's algorithm to approximate LWMCMC well. As N → ∞, Algorithm A.1 converges to LWMCMC.
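Remark A.2 can be illustrated numerically. The sketch below computes the smallest N for which the DKW bound $2\exp(-2N\varepsilon^2)$ (with Massart's constant 2) falls below a level α, and produces an Algorithm A.1-style estimate from multinomial counts. Both helpers, and the toy values of $h(x_i)$ and the weights, are hypothetical and chosen for illustration. The resampled estimate averages to the exact weighted sum used by LWMCMC, but each single draw carries extra variance that shrinks as N grows.

```python
import math
import numpy as np

def dkw_resample_size(eps, alpha):
    """Smallest N with 2 exp(-2 N eps^2) <= alpha (DKW bound, Massart's constant)."""
    return math.ceil(math.log(2.0 / alpha) / (2.0 * eps ** 2))

def resampled_estimate(h_vals, w, N, rng):
    """Algorithm A.1-style estimate sum_i (N_i / N) h(x_i), N_i ~ Multinomial(N, w)."""
    counts = rng.multinomial(N, w)
    return counts @ np.asarray(h_vals, dtype=float) / N

h_vals = np.array([0.0, 1.0, 2.0])     # h(x_i), i = 0, 1, 2 (toy values)
w = np.array([0.2, 0.3, 0.5])          # within-iteration weights
exact = w @ h_vals                     # LWMCMC term sum_i w(x_i) h(x_i), about 1.3
rng = np.random.default_rng(0)
ests = np.array([resampled_estimate(h_vals, w, 10, rng) for _ in range(20_000)])
```

For ε = α = 0.05 the bound asks for N = 738 resampled points per iteration, whereas the locally weighted estimator uses the exact weights at no resampling cost.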
A.3 Derivation of ESS for LWMCMC


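Proposition 3.1. The ESS for samples and weights of the form produced by Algorithm 2.1 can be written as

$$\mathrm{ESS} = \frac{n\,\sigma^2}{\mathrm{Var}(\bar{x})\left(1 + 2\sum_{k=1}^{\infty}\gamma_k\right)},$$

where $\gamma_k$ is the lag-k autocorrelation function of $\{\bar{x}^{(j)}\}$ and $\mathrm{Var}(\bar{x}) = \mathrm{Var}(\bar{x}^{(j)})$ for all j by stationarity.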

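Proof.

$$\frac{\sigma^2}{\mathrm{ESS}} = \mathrm{Var}(\hat\mu) = \frac{1}{n^2}\sum_{j=1}^{n}\mathrm{Var}(\bar{x}^{(j)}) + \frac{2}{n^2}\sum_{j<k}\mathrm{Cov}(\bar{x}^{(j)}, \bar{x}^{(k)})$$
$$= \frac{1}{n}\mathrm{Var}(\bar{x}) + \frac{2}{n}\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\gamma_k\,\mathrm{Var}(\bar{x}) = \frac{\mathrm{Var}(\bar{x})}{n}\left\{1 + 2\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\gamma_k\right\},$$

where the second equality follows from stationarity (there are n − k pairs at lag k). Recall that by the Cesàro summability theorem, provided $\sum_k \gamma_k$ converges,

$$\lim_{n\to\infty}\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\gamma_k = \sum_{k=1}^{\infty}\gamma_k.$$

For sufficiently large n, we therefore substitute the right-hand side of this equality into the expression derived above. Rearranging the terms gives the desired result. □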
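The Cesàro step can be checked numerically for an assumed geometric autocorrelation $\gamma_k = \phi^k$ (as for an AR(1) process), for which $\sum_{k\ge 1}\gamma_k = \phi/(1-\phi)$. The helper name is hypothetical; the finite-n sum approaches the limit at rate O(1/n).

```python
import numpy as np

def truncated_sum(gamma, n):
    """sum_{k=1}^{n-1} (1 - k/n) gamma_k, the finite-n factor in the proof."""
    k = np.arange(1, n)
    return np.sum((1 - k / n) * gamma(k))

phi = 0.5
gamma = lambda k: phi ** k        # assumed geometric autocorrelations
limit = phi / (1 - phi)           # sum_{k >= 1} phi^k
```

For this example the gap to the limit is approximately $\phi/\{n(1-\phi)^2\}$, about 2/n when φ = 0.5, which is why the substitution is only justified for sufficiently large n.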